Discourse Segmentation by Human and Automated Means

Authors

  • Rebecca J. Passonneau
  • Diane J. Litman
Abstract

The need to model the relation between discourse structure and linguistic features of utterances is almost universally acknowledged in the literature on discourse. However, there is only weak consensus on what the units of discourse structure are, or the criteria for recognizing and generating them. We present quantitative results of a two-part study using a corpus of spontaneous, narrative monologues. The first part of our paper presents a method for empirically validating multiutterance units referred to as discourse segments. We report highly significant results of segmentations performed by naive subjects, where a commonsense notion of speaker intention is the segmentation criterion. In the second part of our study, data abstracted from the subjects' segmentations serve as a target for evaluating two sets of algorithms that use utterance features to perform segmentation. On the first algorithm set, we evaluate and compare the correlation of discourse segmentation with three types of linguistic cues (referential noun phrases, cue words, and pauses). We then develop a second set using two methods: error analysis and machine learning. Testing the new algorithms on a new data set shows that when multiple sources of linguistic knowledge are used concurrently, algorithm performance improves.
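The evaluation setup described in the abstract (human segmentations abstracted into a target, then compared with algorithm output) can be illustrated with a minimal sketch. The abstract does not say how the target is derived or which metrics are used, so the choices here are assumptions made only for illustration: a boundary counts as a target when at least `threshold` annotators marked it, and agreement is scored with precision and recall. The function names and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def majority_boundaries(annotations: List[Set[int]], threshold: int) -> Set[int]:
    """Return boundary positions marked by at least `threshold` annotators.

    Assumption: the abstract only says the target is "abstracted from"
    the subjects' segmentations; a vote threshold is one simple way to do so.
    """
    counts: Dict[int, int] = {}
    for marked in annotations:
        for pos in marked:
            counts[pos] = counts.get(pos, 0) + 1
    return {pos for pos, n in counts.items() if n >= threshold}


def precision_recall(predicted: Set[int], target: Set[int]) -> Dict[str, float]:
    """Score predicted boundaries against target boundaries.

    Assumption: precision and recall are illustrative metrics chosen here,
    not ones named in the abstract.
    """
    hits = len(predicted & target)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(target) if target else 0.0
    return {"precision": precision, "recall": recall}


# Hypothetical toy data: each set holds the utterance positions after which
# one annotator placed a segment boundary.
subjects = [{3, 7, 12}, {3, 8, 12}, {3, 7}, {7, 12}, {3, 12, 15}]
target = majority_boundaries(subjects, threshold=3)

# Hypothetical output of some segmentation algorithm on the same narrative.
algorithm_output = {3, 9, 12}
scores = precision_recall(algorithm_output, target)
print(sorted(target), scores)
```

In this toy case the target contains only the boundaries that three or more of the five hypothetical annotators agree on, and the algorithm is credited only for boundaries that fall exactly on those positions.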

Journal:
  • Computational Linguistics

Volume 23  Issue 

Pages  -

Publication date 1997